Jiaxuan Liu

ERNIE 5.0 Technical Report

Feb 04, 2026
Via arXiv

PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

Jan 29, 2026
Via arXiv

FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes

Jan 21, 2026
Via arXiv

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Oct 16, 2025
Via arXiv

UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech

May 15, 2025
Via arXiv

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles

Dec 04, 2024
Via arXiv

AppAgent: Multimodal Agents as Smartphone Users

Dec 22, 2023
Via arXiv

DroidBot-GPT: GPT-powered UI Automation for Android

Apr 14, 2023
Via arXiv